DOCUMENT RESUME

ED 190 597                                                  TM 800 385

AUTHOR          Huynh, Huynh; Saunders, Joseph C., III
TITLE           Bayesian and Empirical Bayes Approaches to Setting
                Passing Scores on Mastery Tests. Publication Series
                in Mastery Testing.
INSTITUTION     South Carolina Univ., Columbia. School of
                Education.
SPONS AGENCY    National Inst. of Education (DHEW), Washington,
                D.C.
REPORT NO       RM-79-2
PUB DATE        Apr 79
GRANT           NIE-G-78-0087
NOTE            17p.; Paper presented at the Joint Annual Meetings of
                the American Educational Research Association and the
                National Council on Measurement in Education (San
                Francisco, CA, April 8-12, 1979).

EDRS PRICE      MF01/PC01 Plus Postage.
DESCRIPTORS     *Bayesian Statistics; *Cutting Scores; Grade 3;
                *Mastery Tests; Minimum Competency Testing; Primary
                Education; *Scoring Formulas; True Scores
IDENTIFIERS     Binomial Error Model; Comprehensive Tests of Basic
                Skills; South Carolina Statewide Testing Program;
                Test Length



ABSTRACT

The Bayesian approach to setting passing scores, as
proposed by Swaminathan, Hambleton, and Algina, is compared with the
empirical Bayes approach to the same problem that is derived from
Huynh's decision-theoretic framework. Comparisons are based on
simulated data which follow an approximate beta-binomial distribution
and on real test results from the Comprehensive Tests of Basic Skills
administered in the South Carolina Statewide Testing Program. Both
procedures lead to setting identical or almost identical passing
scores as long as the test score distribution is reasonably symmetric
or when the minimum mastery level or criterion level is high. Larger
discrepancies tend to occur when this level is low, especially when
the distribution of test scores is concentrated at a few extreme
scores or when the frequencies are irregular. However, in terms of
mastery/nonmastery decisions, the two procedures result in the same
classifications in practically all situations. The empirical Bayes
procedure may be used for tests of any length, while the Bayesian
procedure is recommended only for tests of eight or more items.
Further, the empirical Bayes procedure can be generalized and applied
to more complex testing situations with less difficulty than the
Bayesian procedure. (Author/CP)



***********************************************************************
*    Reproductions supplied by EDRS are the best that can be made     *
*                     from the original document.                     *
***********************************************************************



PUBLICATION SERIES IN MASTERY TESTING
University of South Carolina
College of Education
Columbia, South Carolina 29208



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR
OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL NATIONAL
INSTITUTE OF EDUCATION POSITION OR POLICY.

Research Memorandum 79-2 
April, 1979 




BAYESIAN AND EMPIRICAL BAYES APPROACHES TO SETTING
PASSING SCORES ON MASTERY TESTS



Huynh Huynh
Joseph C. Saunders III

University of South Carolina 



"PERMISSION TO REPRODUCE THIS 
MATERIAL HAS BEEN GRANTED BY 



TO THE EDUCATIONAL RESOURCES 
INFORMATION CENTER (ERIC)." 



Presented at the symposium "Psychometric approaches to domain-
referenced testing" sponsored jointly by the American Educational
Research Association and the National Council on Measurement in
Education at their annual meetings in San Francisco, April 8-12, 1979.

ABSTRACT

The Bayesian approach to setting passing scores as proposed by
Swaminathan, Hambleton, and Algina is compared with the empirical
Bayes approach to the same problem that is derived from Huynh's
decision-theoretic framework. Comparisons are based on simulated
data which follow an approximate beta-binomial distribution and on
real test data sampled from a statewide testing program. It is
found that the two procedures lead to setting identical or almost
identical passing scores as long as the test score distribution is
reasonably symmetric or when the minimum mastery level or criterion
level is high. Larger discrepancies tend to occur when this level
is low, especially when the distribution of test scores is
concentrated at a few extreme scores or when the frequencies are
irregular. However, in terms of mastery/nonmastery decisions, the two
procedures result in the same classifications in practically all
situations. However, the empirical Bayes procedure may be used for
tests of any length, while the Bayesian procedure is recommended
only for tests of 8 or more items. Additionally, the empirical



This work was performed pursuant to Grant No. NIE-G-78-0087 with the
National Institute of Education, Department of Health, Education,
and Welfare, Huynh Huynh, Principal Investigator.




Bayes procedure can be generalized and applied to more complex
testing situations with less difficulty than the Bayesian procedure.



1. INTRODUCTION

Among the many decision-theoretic approaches to setting passing
scores (or standards) for mastery tests, there are at least two
methods which rely on test data collected from a group of examinees.
The Bayesian procedure, as presented in Swaminathan, Hambleton, and
Algina (1975), assumes that prior knowledge regarding the examinees
is exchangeable (Novick, Lewis, & Jackson, 1973) and can be
quantified in some appropriate manner. On the other hand, the
empirical Bayes approach, as formulated in Huynh (1976a), uses only
the true ability distribution of the examinees and makes no
assumption regarding prior knowledge about the examinees. Both
procedures use test data collected from a group of examinees and
establish passing scores for mastery tests by minimizing certain loss
functions. The purpose of this paper is to present a comparison of
the two sets of standards (passing scores) formulated under a variety
of conditions which can be expected to be encountered in mastery
testing or in minimum competency testing. The comparison will be
made first on the basis of approximate beta-binomial test scores.
Further comparisons will be made using the Comprehensive Tests of
Basic Skills (CTBS, 1973) data collected in the 1978 South Carolina
Statewide Testing Program.

2. AN OVERVIEW OF THE BAYESIAN AND
EMPIRICAL BAYES APPROACHES

Overall Framework

The Bayesian framework as presented by Swaminathan et al. and
the special empirical Bayes procedure described in Huynh (1976a,
p. 70-73) start with a typical four-corner setup used in decision
theory. (See Figure 1, p. 16, for the basic elements of this setup.)
Let $\theta$ ($\pi$ in the notation of Swaminathan et al.) be the
true score (or true ability) of an examinee and x be the observed
test score as obtained from an n-item test. For the binomial error
model adopted in both standard setting approaches, $\theta$ is the
proportion of items in a real or hypothetical item pool that an
examinee answers correctly. Let a person be called a master if that
person's true score $\theta$ is such that $\theta \ge \theta_0$ and a
nonmaster if $\theta < \theta_0$. Here, $\theta_0$ is a given
constant which defines the lower boundary of the mastery level or
the criterion level. Since a person's true score cannot be observed
directly, decisions about whether to call the person a master must
be based on an observed test score. What remains to be determined is
the cutoff score c that will be in some sense optimal.

On the basis of the test score x, a person is called a master
if $x \ge c$ and a nonmaster if $x < c$. A correct decision is made
whenever either (a) $\theta \ge \theta_0$ and $x \ge c$, or
(b) $\theta < \theta_0$ and $x < c$. Otherwise, either a false
positive error ($\theta < \theta_0$ and $x \ge c$) or a false
negative error ($\theta \ge \theta_0$ and $x < c$) is encountered.

In the case where the loss associated with each error is
constant, generality is not diminished if we let the loss incurred by
a false positive error be equal to 1 and that associated with a
false negative error be equal to Q. Here, Q expresses the ratio of
the false negative error loss to the false positive error loss.
(In the notation of Swaminathan et al., $Q = l_{21}/l_{12}$.)
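
The four-corner setup described above can be sketched in a few lines
of Python. This is only an illustration of the loss structure; the
function name `decision_loss` and its argument names are assumptions
introduced here and do not come from either of the procedures being
compared.

```python
def decision_loss(theta, x, theta0, c, Q):
    """Loss for one examinee under the constant-loss four-corner setup.

    theta  : true score (proportion of the item pool answered correctly)
    x      : observed test score
    theta0 : criterion level; the examinee is a true master if theta >= theta0
    c      : cutoff score; the examinee is declared a master if x >= c
    Q      : ratio of the false negative loss to the false positive loss
    """
    true_master = theta >= theta0
    declared_master = x >= c
    if declared_master and not true_master:
        return 1.0          # false positive error
    if true_master and not declared_master:
        return float(Q)     # false negative error
    return 0.0              # correct decision


if __name__ == "__main__":
    # A true nonmaster who passes, and a true master who fails.
    print(decision_loss(theta=0.55, x=8, theta0=0.60, c=7, Q=0.5))
    print(decision_loss(theta=0.75, x=6, theta0=0.60, c=7, Q=0.5))
```

Because only the ratio of the two losses matters for choosing c, the
false positive loss is fixed at 1 in this sketch, as in the text.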
Bayesian Approach

Now let an n-item test be given to m examinees. In the Bayesian
procedure as implemented by Swaminathan et al., the prior
information regarding the examinees is assumed to be exchangeable
(i.e., prior knowledge regarding one examinee can be interchanged
with that associated with another examinee without causing any
disturbance in the decision problem). The model requires knowledge
(prior belief) of the distribution of the variance of true scores
for the group. (In point of fact, an arcsine transformation of
$\theta$ is used.) This prior distribution is taken to be the inverse
chi-square distribution with parameter $\lambda$ and degrees of
freedom $\nu$. A recommended choice of $\nu$ is 8 (Novick et al.,
1973).

To assess $\lambda$, let t be the number of test items which would
need to be administered to a typical examinee in order to obtain as
much information about that examinee's $\theta$ as we already have.
Then $\lambda = 3/(2t+1)$. Wang (1973) has tables to facilitate
computation in this procedure. In the setup of the Wang tables,
$\lambda/\nu$ is chosen as .01, .02, .03, .04, and .05. With $\nu$ = 8,
these ratios correspond to the t values of 18.25, 8.875, 5.75,
4.1875, and 3.25. Given the prior information as revealed through
$\lambda$ and $\nu$ and the test data of m subjects, it is possible
via the Wang tables to compute the two expected losses,
$\Pr(\theta < \theta_0 \mid \text{test data})$ and
$Q \cdot \Pr(\theta \ge \theta_0 \mid \text{test data})$, at each
test score. A Bayesian passing score is then the smallest score at
which the first expected loss is smaller than the second one. More
details may be found in Swaminathan et al. (1975) and in Novick
et al. (1973).
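
The correspondence between t and the ratio $\lambda/\nu$ quoted above
can be checked with a short Python sketch. It assumes $\nu$ = 8 and
the relation $\lambda = 3/(2t+1)$ stated in the text; the function
names are hypothetical and chosen here only for illustration.

```python
def lambda_from_t(t):
    """Prior parameter lambda implied by t equivalent test items."""
    return 3.0 / (2.0 * t + 1.0)


def t_from_lambda(lam):
    """Inverse relation: number of equivalent test items for a given lambda."""
    return (3.0 / lam - 1.0) / 2.0


def ratio_from_t(t, nu=8):
    """The ratio lambda/nu used to enter the Wang tables (nu = 8 assumed)."""
    return lambda_from_t(t) / nu


if __name__ == "__main__":
    for t in (18.25, 8.875, 5.75, 4.1875, 3.25):
        print(t, round(ratio_from_t(t), 4))
```

Under these assumptions the five t values reproduce the tabled ratios
.01 through .05.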

Empirical Bayes Approach

The empirical Bayes solution assumes that the m examinees
constitute a random sample from a population for which the true
ability $\theta$ follows a known distributional form such as the beta
density with parameters $\alpha$ and $\beta$ (Keats & Lord, 1962,
page 68). Sample test data are used to obtain the estimates
$\hat\alpha$ and $\hat\beta$, and the results are used to compute the
probability of a false positive decision,
$\Pr(\theta < \theta_0,\ x \ge c)$, and the weighted probability of a
false negative decision, $Q \cdot \Pr(\theta \ge \theta_0,\ x < c)$,
at a given cutoff score c. The optimum passing score (henceforth
referred to simply as the passing score) will be the value of c at
which the average loss,
$\Pr(\theta < \theta_0,\ x \ge c) + Q \cdot \Pr(\theta \ge \theta_0,\ x < c)$,
is the smallest.

The procedure is implemented as follows. Let $\bar{x}$ and s be the
mean and standard deviation of the test scores, and let the Kuder-
Richardson reliability coefficient be defined as

$$\alpha_{21} = \frac{n}{n-1}\left[1 - \frac{\bar{x}(n-\bar{x})}{n s^2}\right].$$

Then

$$\hat\alpha = (-1 + 1/\alpha_{21})\,\bar{x}$$

and

$$\hat\beta = -\hat\alpha + n/\alpha_{21} - n.$$

For test scores with insufficient variability, $\alpha_{21}$ may be
negative. If this occurs, simply replace $\alpha_{21}$ by the smallest
positive reliability estimate which happens to be available. Let I
denote the incomplete beta function as tabulated in Pearson (1934)
and implemented via computer programs such as the IBM Scientific
Subroutine Package (1971) or the IMSL (1977). Then the passing score
is the smallest integer c at which

$$I(\hat\alpha + c,\ n + \hat\beta - c;\ \theta_0) < Q/(1+Q). \qquad (1)$$
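
The steps above can be sketched in Python as follows. This is a
minimal illustration, not the authors' program: the function names are
hypothetical, the incomplete beta function is evaluated here by a
standard continued-fraction expansion rather than by the tables or the
IBM/IMSL routines cited in the text, and the variance is assumed to be
computed with divisor m.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        terms = (
            m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1)),
        )
        for term in terms:
            d = 1.0 + term * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + term / c
            if abs(c) < tiny:
                c = tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def incomplete_beta(a, b, x):
    """Regularized incomplete beta function I(a, b; x)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def beta_estimates(n, mean, variance):
    """Estimates of alpha and beta from KR-21, as in the text."""
    kr21 = n / (n - 1.0) * (1.0 - mean * (n - mean) / (n * variance))
    if kr21 <= 0.0:
        raise ValueError("KR-21 is not positive; substitute a positive estimate")
    alpha = (-1.0 + 1.0 / kr21) * mean
    beta = -alpha + n / kr21 - n
    return alpha, beta


def passing_score(n, alpha, beta, theta0, Q):
    """Smallest integer c satisfying Inequality (1); None if no c in 0..n does."""
    bound = Q / (1.0 + Q)
    for c in range(n + 1):
        if incomplete_beta(alpha + c, n + beta - c, theta0) < bound:
            return c
    return None


if __name__ == "__main__":
    # Illustrative input: a 10-item test with mean 6.8 and variance 3.51.
    a, b = beta_estimates(10, 6.8, 3.51)
    print(round(a, 3), round(b, 3))
    print(passing_score(10, a, b, theta0=0.60, Q=0.25))
```

The search simply walks c upward from 0, which is adequate for the
short tests considered here.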
A normal approximation is available if there is a sufficiently
large number of items and if $\theta_0$ is not near 0 or 1. Let $\xi$
denote the 100/(1+Q) percentile of the unit normal distribution. Then
the test passing score is nearly equal to

$$c \approx (n+\hat\alpha+\hat\beta-1)\,\theta_0 + \xi\sqrt{(n+\hat\alpha+\hat\beta-1)\,\theta_0(1-\theta_0)} - \hat\alpha + .5. \qquad (2)$$

The data presented in Huynh (1976b) indicate that the passing score
computed from Equation (2) does not differ appreciably from the one
deduced from Inequality (1) when the test consists of 20 items and
when $\theta_0$ is within the range from .50 to .80.
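
Equation (2) is easy to evaluate directly. The sketch below uses
Python's standard library for the normal percentile; the function name
is hypothetical, and rounding the result up to the next integer to
obtain a usable passing score is an assumption made here, since the
text says only that the passing score is "nearly equal to" this value.

```python
import math
from statistics import NormalDist


def approximate_passing_score(n, alpha, beta, theta0, Q):
    """Normal approximation to the passing score, following Equation (2)."""
    # xi is the 100/(1+Q) percentile of the unit normal distribution.
    xi = NormalDist().inv_cdf(1.0 / (1.0 + Q))
    size = n + alpha + beta - 1.0
    spread = math.sqrt(size * theta0 * (1.0 - theta0))
    return size * theta0 + xi * spread - alpha + 0.5


if __name__ == "__main__":
    # Illustrative values of alpha and beta for a 10-item test.
    value = approximate_passing_score(10, 9.3028, 4.3778, theta0=0.60, Q=1.0)
    print(round(value, 3), math.ceil(value))
```

With Q = 1 the percentile is the median, so the middle term vanishes
and the approximation reduces to
$(n+\hat\alpha+\hat\beta-1)\theta_0 - \hat\alpha + .5$.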



3. A COMPARISON OF BAYESIAN AND EMPIRICAL BAYES
PASSING SCORES FOR APPROXIMATE
BETA-BINOMIAL TEST DATA

The passing score obtained via the empirical Bayes approach,
as revealed by Inequality (1), is based on test score data that
follow a beta-binomial distribution. It may be of interest to
compare the Bayesian approach to setting a passing score with the
empirical Bayes approach, using test data which follow closely a
beta-binomial form.

Both the present comparison and the one detailed in the next
section are based on tests with ten items. In these comparisons,
the criterion or minimum mastery level is set at $\theta_0$ = .60,
.70, and .80. The loss ratio is chosen to be Q = .25, .50, 1.00, and
2.00. (A loss ratio smaller than one indicates that a false negative
error is less serious than a false positive error.) To compute a
passing score via the Bayesian approach, it is necessary to specify
the ratio $\lambda/\nu$ or, equivalently, the quantity t as described
in Section 2. It may be recalled that t may be interpreted as the
number of "test items" which are believed to be as informative as
the prior belief about the examinees. In practical situations
involving standard setting, it seems unreasonable to let the prior
belief carry as much weight as the objective test data. In other
words, it is unlikely that t is too close to n. Thus for the
comparisons based on 10-item tests reported in this section and in
Section 4, as well as the comparisons based on 20-item tests
described in Section 5, the t-values are chosen to be 8.875
($\lambda/\nu$ = .02), 5.75 ($\lambda/\nu$ = .03), 4.1875
($\lambda/\nu$ = .04), and 3.25 ($\lambda/\nu$ = .05).

The first five test score frequency distributions (labeled A1
through A5 in Table 1) serve as the data base for the comparison of
the passing scores computed by the two procedures using test score
distributions that are approximately beta-binomial. Each is
deliberately chosen (i) to yield an $s_g^2$ value (variance of the
arcsine-square-root transformation of the test scores) conforming as
closely as possible to the tabulated $s_g^2$ values of the Wang tables
(so that no interpolation would be necessary) and (ii) to reflect
several degrees of skewness and variability thought to be typical of
mastery testing situations. (Also in Table 1, and explained below,
are distributions of actual test scores from the South Carolina
Statewide Testing Program.) It may be noted that, in Table 1, the
quantity D(%) represents the maximum percent difference between
the observed and beta-binomial-fitted cumulative frequencies. A
small D-value indicates a good fit.

Table 2 reports the Bayesian passing scores and the
corresponding empirical Bayes passing scores (the last entry in each
cell) for several combinations of $\theta_0$, Q, and t. The data
indicate that for the situations under consideration, the Bayesian
and empirical Bayes passing scores are identical, or nearly so, as
long as the test score distribution is reasonably symmetrical (Cases
A2, A4, and A5). For highly skewed distributions (Cases A1 and A3)
the two passing scores rarely differ by more than one unit when the
criterion level $\theta_0$ is relatively high (.70 or .80) and when
$\lambda/\nu$ is such that t is not too close to n, say when
$\lambda/\nu$ is at least .03. Large discrepancies, however, may occur
at a low criterion level such as .60 or when t is close to n.

TABLE 1

Frequency Distributions of Test Scores Used
in Comparisons of Passing Scores

Data                                                                 Frequency at score of
Set  Source/Subtest                          m   D(%)  S.D.  Skew.   0   1   2   3   4   5   6   7   8   9  10

Approximate Beta-Binomial
A1   Fictitious                             40    3.1  1.36  -0.61                       1   3   6   8  11  11
A2   Fictitious                             80    1.0  1.87  -0.31           1   3   6  10  13  16  15  11   5
A3   Fictitious                             40    1.2  1.01  -1.51                           1   2   4  10  23
A4   Fictitious                             40    1.6  2.01  -0.02       1   3   5   6   7   7   5   4   2   0
A5   Fictitious                             40    1.0  2.15   0.12   1   3   5   6   7   6   5   4   2   1   0

Comprehensive Tests of Basic Skills
B1   Mathematics concepts and applications  20    [?]  1.28  -0.63   [?] (legible entries: 1, 6, 7)
B2   Mathematics computations               20    9.2  1.45  -0.24                           3   4   3   4   6
B3   Spelling                               20    6.1  1.76  -1.04                   2   0   1   2   6   4   5
B4   Social studies                         40    6.2  2.11   0.27       1   4   5   9   5   5   6   3   1   1
B5   Language expression                    40    8.2  1.86  -0.53           1   1   5   3   4  11  10   3   2
B6   Reading                                40    4.1  1.22  -2.12                       1   1   2   3   3  30
B7   Science                                60    5.6  1.74  -0.22               2   6  10   8  14   8  12   0
B8   Reading vocabulary                     60    3.2  1.56  -1.75               1   0   3   1   5   5  16  29
B9   Reading vocabulary                     80    2.7  1.68  -1.49               2   1   2   5   6  11  23  30
B10  Spelling                               80    2.1  1.50  -1.44               1   0   2   4   7  12  16  38

m is the total number of scores in the distribution.

D(%) represents the maximum percent difference between the observed
and beta-binomial-fitted cumulative frequencies. All are not
significant at the ten percent level of significance.

[?] marks an entry that is illegible in the source copy.
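
The quantity D(%) reported in Table 1 can be sketched in Python as
follows. Two points are assumptions made for this illustration and are
not stated in the text: the beta-binomial parameters are fitted from
the KR-21 estimates described in Section 2, and the variance uses the
divisor m. The function names are hypothetical. The example
frequencies are those of data set A2 in Table 1.

```python
import math


def beta_binomial_pmf(n, alpha, beta):
    """Probabilities f(0), ..., f(n) of the beta-binomial distribution."""
    def log_beta(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    base = log_beta(alpha, beta)
    return [
        math.comb(n, x) * math.exp(log_beta(alpha + x, beta + n - x) - base)
        for x in range(n + 1)
    ]


def fit_by_kr21(n, freqs):
    """Moment-type estimates of alpha and beta from a frequency list."""
    m = sum(freqs)
    mean = sum(x * f for x, f in enumerate(freqs)) / m
    variance = sum(x * x * f for x, f in enumerate(freqs)) / m - mean ** 2
    kr21 = n / (n - 1.0) * (1.0 - mean * (n - mean) / (n * variance))
    alpha = (-1.0 + 1.0 / kr21) * mean
    beta = -alpha + n / kr21 - n
    return alpha, beta


def max_percent_difference(n, freqs):
    """Largest gap, in percent, between observed and fitted cumulative frequencies."""
    m = sum(freqs)
    alpha, beta = fit_by_kr21(n, freqs)
    fitted = beta_binomial_pmf(n, alpha, beta)
    observed_cum = fitted_cum = largest = 0.0
    for x in range(n + 1):
        observed_cum += freqs[x] / m
        fitted_cum += fitted[x]
        largest = max(largest, abs(observed_cum - fitted_cum))
    return 100.0 * largest


if __name__ == "__main__":
    a2 = [0, 0, 1, 3, 6, 10, 13, 16, 15, 11, 5]   # scores 0 through 10
    print(round(max_percent_difference(10, a2), 2))
```

This is the same kind of comparison of cumulative distributions that
underlies a Kolmogorov-Smirnov test, which is presumably the basis of
the significance statement in the table note.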



TABLE 2

Bayesian and Empirical Bayes Passing Scores for Five
Approximate Beta-Binomial Test Score Distributions

Each cell gives the Bayesian passing scores at $\lambda/\nu$ = .02,
.03, .04, .05, followed in parentheses by the empirical Bayes passing
score. [?] marks an entry that is illegible in the source copy.

Data
Set  $\theta_0$  Q = .25               Q = .50               Q = 1.00              Q = 2.00

A1   .60   4, 5, 6, 6 (4)        3, 4, 5, 5 (2)        2, 3, 4, 4 (1)        1, 2, 3, 3 (0)
     .70   7, 8, 8, 8 (6)        6, 7, 7, 7 (5)        5, 5, 6, 6 (4)        4, 4, 5, 5 (3)
     .80   10, 10, 10, 10 (9)    9, 9, 9, 9 (9)        8, 8, 8, 8 (7)        7, 7, 7, 7 (6)

A2   .60   7, 8, 8, 8 (7)        6, 7, 7, 7 (6)        5, 6, 6, 6 (5)        4, 4, 5, 5 (4)
     .70   10, 10, 9, 9 (9)      9, 9, 9, 9 (9)        8, 8, 8, 8 (8)        7, 7, 7, 7 (7)
     .80   10, 10, 10, 10 (10)   10, 10, 10, 10 (10)   10, 10, 10, 10 (10)   9, 9, 9, 9 (9)

A3   .60   1, 3, 4, 4 (3)        1, 2, 3, 3 (2)        0, 1, 2, 2 (2)        0, 1, 1, 2 (0)
     .70   4, 5, 6, 6 (6)        3, 4, 5, 5 (5)        2, 3, 4, 4 (4)        1, 2, 3, 3 (3)
     .80   8, 8, 9, 9 (8)        7, 7, 8, 8 (7)        5, 6, 7, 7 (6)        4, 5, 6, 6 (5)

A4   .60   9, 9, 9, 9 (9)        9, 8, 8, 8 ([?])      8, 7, 7, 7 ([?])      7, 6, 6, 6 ([?])
     .70   10, 10, 10, 10 (10)   10, 10, 10, 10 (10)   10, 9, 9, 9 (10)      9, 9, 8, 8 ([?])
     .80   10, 10, 10, 10 (10)   10, 10, 10, 10 (10)   10, 10, 10, 10 (10)   10, 10, 10, 10 (10)

A5   .60   10, 10, 9, 9 (10)     9, 9, 9, 9 (9)        8, 8, 8, 8 (8)        7, 7, 7, 7 (7)
     .70   10, 10, 10, 10 (10)   10, 10, 10, 10 (10)   10, 10, 9, 9 (10)     9, 9, 9, 9 (9)
     .80   10, 10, 10, 10 (10)   10, 10, 10, 10 (10)   10, 10, 10, 10 (10)   10, 10, 10, 10 (10)

4. A COMPARISON OF BAYESIAN AND EMPIRICAL
BAYES PASSING SCORES FOR CTBS TEST DATA

This phase of the study is based on a 10% systematic sample
of the entire third grade CTBS-Level C data file compiled during the
1978 South Carolina Statewide Testing Program. To obtain the
frequency distributions labeled as B1 to B10 (in Tables 1 and 3), the
following procedure was used. First, ten 10-item subtests were
assembled by random selection of items from each CTBS subtest.
Next, for each 10-item subtest, a frequency distribution was
constructed for each school district which had at least 20 students
in the systematic sample, and the corresponding $s_g^2$ value was
obtained. (The $s_g^2$ values were distributed as follows: .10 to .50
(32%), .51 to .75 (38%), .76 to 1.00 (20%), and more than 1.00 (10%).
Large $s_g^2$ values tended to associate with subtests dealing with
reading comprehension (sentences or paragraphs), language expression,
and language mechanics.) Third, among the frequency distributions
with $s_g^2$ values included between .01 and .05, ten were finally
selected and altered slightly so that the total number of examinees
(m) was exactly 20, 40, 60, or 80.

Table 3 lists the Bayesian and empirical Bayes passing scores
under a variety of conditions. As in the previous section, the data
show that the two sets of passing scores are the same, or nearly
so, as long as the test score distribution is reasonably symmetric
(see Cases B4, B5, and B7). Discrepancies in these situations are
rarely larger than one unit. For most other situations, the
difference between the two values for a passing score is seldom
larger than one unit when the criterion $\theta_0$ is .70 or .80 and
when $\lambda/\nu$ is at least .03. The same magnitude of difference,
one unit, also tends to hold at $\theta_0$ = .60 unless the test
scores pile up at extreme values (Case B6) or unless the frequencies
are fairly irregular (Case B1).

TABLE 3

Bayesian and Empirical Bayes Passing Scores
for Ten CTBS Test Score Distributions

Each cell gives the Bayesian passing scores at $\lambda/\nu$ = .02,
.03, .04, .05, followed in parentheses by the empirical Bayes passing
score. [?] marks an entry that is illegible in the source copy.

Data
Set  $\theta_0$  Q = .25               Q = .50               Q = 1.00              Q = 2.00

B1   .60   5, 5, 6, 6 (3)        4, 4, 5, 5 (2)        3, 3, 4, 4 (1)        2, 2, 3, 3 ([?])
     .70   7, 7, 8, 8 (6)        6, 6, 7, 7 (5)        5, 5, 6, 6 ([?])      4, 4, 5, 5 (3)
     .80   10, 10, 10, 10 (9)    9, 9, 9, 9 (8)        8, 8, 8, 8 (7)        7, 7, 7, 7 (6)

B2   .60   6, 6, 6, 6 (5)        5, 5, 5, 5 (4)        4, 4, 4, 5 (2)        3, 3, 3, 4 (1)
     .70   8, 8, 8, 8 (7)        7, 7, 7, 7 (6)        6, 6, 6, 6 (5)        [?], 5, 5, 6 (4)
     .80   10, 10, 10, 10 (9)    9, 9, 9, 9 (9)        8, 8, 8, 8 (8)        7, 7, 8, 8 (7)

B3   .60   6, 6, 7, 7 ([?])      5, 5, 6, 6 (6)        4, 4, 5, 5 (5)        3, 4, 4, 4 (4)
     .70   8, 8, 8, 8 ([?])      7, 7, 8, 8 (7)        6, 7, 7, 7 (6)        5, 6, 6, 6 (6)
     .80   10, 10, 10, 10 (10)   9, 9, 9, 9 (9)        9, 9, 9, 9 (8)        8, 8, 8, 8 ([?])

B4   .60   9, 9, 9, 9 (9)        9, 8, 8, 8 ([?])      8, 8, 7, 7 (7)        7, 7, 6, 6 (7)
     .70   10, 10, 10, 10 ([?])  10, 10, 10, 10 (10)   10, 9, 9, 9 (9)       9, 9, 8, 8 (9)
     .80   10, 10, 10, 10 ([?])  10, 10, 10, 10 (10)   10, 10, 10, 10 (10)   10, 10, 10, 10 (10)

B5   .60   8, 8, 8, 8 (7)        7, 7, 7, 7 (6)        6, 6, 6, 6 (5)        4, 5, 5, 5 (4)
     .70   10, 10, 9, 9 (10)     9, 9, 9, 9 (9)        8, 8, 8, 8 ([?])      7, 7, 7, 7 (7)
     .80   10, 10, 10, 10 (10)   10, 10, 10, 10 ([?])  10, 10, 10, 10 (10)   9, 9, 9, 9 (9)

B6   .60   2, 3, 4, 5 ([?])      1, 2, 3, 4 ([?])      1, 2, 2, 3 (5)        0, 1, 1, 2 (4)
     .70   5, 5, 6, 7 ([?])      3, 4, 5, 6 (7)        2, 3, 4, 5 ([?])      [?], 2, 3, 4 (6)
     .80   8, 8, 9, 9 ([?])      [?], 7, 8, 8 ([?])    6, 6, 7, 7 ([?])      4, 5, 6, 6 (7)

B7   .60   8, 8, 8, 8 (7)        7, 7, 7, 7 (6)        5, 6, 6, 6 (5)        4, 5, 5, 5 (4)
     .70   10, 10, 10, 10 (9)    9, 9, 9, 9 (9)        [?], 8, 8, 8 (8)      7, 7, 7, 7 (7)
     .80   10, 10, 10, 10 (10)   10, 10, 10, 10 (10)   10, 10, 10, 10 (10)   10, 10, 9, 9 (10)

B8   .60   3, 4, 5, 6 (6)        2, 3, 4, 5 (5)        2, 2, 3, 4 (5)        1, 2, 2, 3 (4)
     .70   6, 7, 7, 8 ([?])      5, 6, 6, 7 (7)        4, 5, 5, 6 (6)        3, 4, 4, 5 (6)
     .80   9, 9, 9, 9 (9)        8, 8, 9, 9 ([?])      7, 7, 8, 8 (8)        6, 6, 7, 7 (7)

B9   .60   4, 5, 5, 6 (6)        3, 4, 4, 5 ([?])      2, 3, 3, 4 (5)        1, 2, 3, 3 (4)
     .70   7, 7, 8, 8 ([?])      4, 6, 7, 7 (7)        4, 5, 6, 6 (6)        3, 4, 5, 5 (6)
     .80   9, 10, 10, 10 (9)     9, 9, 9, 9 (9)        8, 8, 8, 8 (8)        6, 7, 7, 7 (7)

B10  .60   3, 4, 5, [?] (6)      2, 3, 4, 5 (5)        1, 2, 3, 4 (5)        1, 1, 2, 3 (4)
     .70   6, 7, 7, 8 ([?])      5, 6, 6, 7 (7)        4, 4, 5, 6 (6)        3, 3, 4, 5 (5)
     .80   9, 9, 9, 9 ([?])      8, 8, 9, 9 ([?])      7, 7, [?], [?] ([?])  6, 6, 7, 7 (7)

5. ADDITIONAL DATA FOR MODERATELY
SKEWED DISTRIBUTIONS

Additional comparisons were made for ten 20-item tests with
distributions having skewness ranging from -1.109 to .117 (see
Table 4). These tests were assembled in the same way as the 10-item
tests described in Section 4. As in the previous sections, the
criterion level $\theta_0$ was set at .60, .70, and .80, and the loss
ratio Q at .25, .50, 1.00, and 2.00. The prior knowledge about the
examinees was assumed to be equivalent to a number of items, t, of
8.875 ($\lambda/\nu$ = .02), 5.75 ($\lambda/\nu$ = .03), 4.1875
($\lambda/\nu$ = .04), and 3.25 ($\lambda/\nu$ = .05). For all the 480
combinations under consideration, the absolute values of the
discrepancies between the two computed passing scores are distributed
as follows: 0 (35%), 1 (37%), 2 (15%), 3 (5%), and 4 or more (8%).
Hence in about three-fourths of all situations, the Bayesian and
empirical Bayes passing scores do not differ from each other by more
than one unit.

TABLE 4

Frequency Distribution of Scores on Ten CTBS Subtests
Mentioned in Section 5

The column headings of the original run from score 5 to score 20.
Frequencies are listed below in ascending order of score; [?] marks
an entry that is illegible in the source copy.

Subtest                                Frequencies

Reading vocabulary                     1, 1, 5, 3, 4, 7, 4, 8, 3, 4
Spelling                               1, [?], 2, 3, 2, 9, 4, 5, 3, 8, 12, 8
Science                                1, 1, 1, 3, 3, 4, 3, 1, 2, 1, [?], 1
Social studies                         2, 0, 2, 0, 3, 1, 2, 2, 6, 9, 14, 4, 13, 0
Social studies                         1, 2, 5, 3, 3, 1, 6, 5, 4, 2, 2, 5, 0, 0, 1
Reading vocabulary                     2, 0, 0, 2, 1, 4, 4, 4, 3, 4, 8, 3, 4, 2
Mathematics concepts and application   1, 0, 0, 1, 2, 3, 2, 3, 4, 0, 7, 7, 2, 6, 2
Reading vocabulary                     1, 2, 3, 2, 5, 5, 6, 9, 7
Social studies                         1, 3, 1, 1, 1, 0, 2, 5, 3, 6, 3, 5, 4, 4, 10
Science                                1, 1, 4, 2, 2, 2, 4, 2, 4, 2, 3, 4, 3, 5, 0, 1

6. AGREEMENT OF MASTERY/NONMASTERY DECISIONS

As noted in Section 4, there are situations (such as some
cases associated with the A1, B1, and B6 data sets) where the
passing scores obtained from the two methods differ appreciably. This
may seem disheartening. However, the procedures provide mastery/
nonmastery classifications which are in high agreement for most
cases under consideration. For Data Set A1 with $\theta_0$ = .60 and
.70, for example, the combined proportions of students identically
classified in either the mastery or nonmastery category by the
Bayesian procedure (with $\lambda/\nu$ = .05) and by the empirical
Bayes procedure are 88%, 95%, 99%, and 100% for Q = .25, .50, 1.00,
and 2.00 respectively. Over the fifteen data sets of Table 1 and with
the same values for $\lambda/\nu$ and Q, the proportions of identical
classifications reach 94%, 96%, 98%, and 97% respectively. As for the
data of Table 4, these proportions stand at 98%, 98%, 98%, and 97%.

Though the overall agreement for classifications is high for
the data considered in this study, some individual cases may show
less agreement than others. These cases include situations such as
A2 with $\theta_0$ = .60, Q = .25, and $\lambda/\nu$ = .05, where the
Bayesian passing score of 8 and the empirical Bayes passing score of
7 are located near the center of the test score distribution. The
shift of only one unit in test score in this case actually causes 16
students out of a total of 80 to be classified differently by the two
procedures. Visible disagreement between the classifications defined
by the Bayesian and empirical Bayes procedures may occur in
situations where scores with high frequencies of occurrence are
selected as the passing scores. If this is the case, the proportion
of students classified in the mastery (or nonmastery) category is not
likely to be close to either 0% or 100%. In other situations where
most students are declared masters (Data Set A1 with $\theta_0$ =
[illegible], $\lambda/\nu$ = .05, and Q = 2.00) or nonmasters (Data
Set A5 with $\theta_0$ = .70, $\lambda/\nu$ = .05, and Q = 1.00), the
agreement in classifications is almost perfect.
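
The agreement figures quoted in this section can be reproduced with a
short Python sketch. The function name is hypothetical; the score
frequencies for data sets A1 and A2 are taken from Table 1 and the
passing scores from Table 2 (Bayesian at $\lambda/\nu$ = .05,
Q = .25).

```python
def classification_agreement(freqs, cutoff_a, cutoff_b):
    """Proportion of examinees placed in the same category by two cutoffs.

    freqs maps each observed score to the number of examinees with that
    score. An examinee is classified differently only when the score lies
    at or above the lower cutoff but below the higher one.
    """
    low, high = sorted((cutoff_a, cutoff_b))
    total = sum(freqs.values())
    differ = sum(count for score, count in freqs.items() if low <= score < high)
    return 1.0 - differ / total


if __name__ == "__main__":
    a1 = {5: 1, 6: 3, 7: 6, 8: 8, 9: 11, 10: 11}
    a2 = {2: 1, 3: 3, 4: 6, 5: 10, 6: 13, 7: 16, 8: 15, 9: 11, 10: 5}

    # A1: Bayesian versus empirical Bayes cutoffs at theta0 = .60 and .70.
    at_60 = classification_agreement(a1, 6, 4)
    at_70 = classification_agreement(a1, 8, 6)
    print(round((at_60 + at_70) / 2, 3))            # combined over both levels

    # A2 at theta0 = .60: cutoffs 8 and 7 separate the 16 examinees scoring 7.
    print(classification_agreement(a2, 8, 7))
```

Averaging the two A1 proportions is equivalent to pooling the counts
here because both are based on the same 40 examinees.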

7. DISCTJSSION AND' CONCLUSION. 

. ' The results described in praviousp settt^Lons may be siimniarized, 
as fdlloVs:- Xi) Bayesian passing scores and thpse " coarp\it,ed via' the 
empirieal Bayes procedure , are i4ehtical or aimos't identical 4s l,ong 
as^-the t^st spore frequency distribution, is reasonably -synmetric or 

^ when the 'criterion level 8 is * sufficiently high' <.70^^r v 80) ; 

4 O- • » - » 

Xiiy^large; discr^^ may occu? at cr|.terioTi 

'levels of .60- (or Iklpw), edpeciatly wh^n the test scores pile up 
at a few extreme values or when the frequ^tncy distribution is 
irregular; (iii) however^ mastery/nonmaster^ decisions -derived from 
the two procedures are most often identical. Overall, the combixied 
proposition of students similarly classified by both procedures is- 
about 97%.' ' r • 

All in all, there is little difference between the Bayasian 
approach as desctibed b34fewaininathan et al. and the Huynh empitical^ 
Bayes procedure <ie^\:ril)e^ere, ; either in terms of the resulting 
^passing scores ok terms .of the maistery/no^aastery categorization. 

It should be pointed out that the procedure by Swaminathan et
al. relies on a normal arcsine-square-root transformation of the
test data and is therefore considered adequate only when the test
has at least 8 items. In addition, the scheme requires the evaluation
of certain posterior probabilities. This may be done via the
MARPRO computer program (mentioned in Wang, 1973) or via the Wang
tables. To the chagrin of the writers, many frequency distributions
such as those derived from the CTBS test data of the South
Carolina Statewide Testing Program have s values much larger than
the upper bound of .05 allowed in the above-mentioned tables. In
addition, the constraint of having at least 8 items seems to be
quite severe in many practical situations involving
objective-referenced testing. Such tests frequently have 5 or fewer
items per objective.
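
The dependence on test length can be illustrated with a minimal
sketch. It uses the plain transformation g = arcsin(sqrt(x/n)), whose
large-sample variance under a binomial model is 1/(4n); the exact
variant of the transformation used by Swaminathan et al. is not
reproduced here, so the function names and the choice of θ = .7 are
assumptions made only for illustration.

```python
import math


def arcsine_sqrt(x, n):
    """Plain arcsine-square-root transform of a score x on an n-item test."""
    return math.asin(math.sqrt(x / n))


def exact_variance(n, theta):
    """Exact variance of the transformed score when x ~ Binomial(n, theta)."""
    probs = [math.comb(n, x) * theta**x * (1 - theta)**(n - x)
             for x in range(n + 1)]
    values = [arcsine_sqrt(x, n) for x in range(n + 1)]
    mean = sum(p * g for p, g in zip(probs, values))
    return sum(p * (g - mean) ** 2 for p, g in zip(probs, values))


# Compare the exact variance with the large-sample value 1/(4n).
for n in (5, 8, 20, 40):
    print(n, round(exact_variance(n, 0.7), 4), round(1 / (4 * n), 4))
```

For short tests the exact variance departs noticeably from 1/(4n),
which is one way to see why a normal approximation on the transformed
scale is regarded as adequate only beyond some minimum number of items.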

The empirical Bayes approach in its simplest form, as presented
by Huynh (1976a), requires that the test scores follow a
beta-binomial distribution. There are indications (Keats & Lord,
1962; Duncan, 1974; Huynh & Saunders, 1979; see also Table 1) that
this model adequately fits many test score distributions. Moreover,
it is known (Subkoviak, 1978; Huynh & Saunders, 1979) that the
model is useful in the estimation of the reliability of mastery
classification based on one test administration. In addition,
using the empirical Bayes approach, passing scores can be obtained
for tests of any length and can be approximated quickly via
Equation (2).
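
A minimal sketch of fitting the beta-binomial model to a score
frequency distribution is given below. It uses moment estimates based
on the KR-21 reliability, a common way to fit this model; that the
analyses reported here used exactly this estimator is an assumption,
and the function names are hypothetical.

```python
from math import exp, lgamma


def fit_beta_binomial(freq):
    """Moment estimates (alpha, beta) from score frequencies freq[0..n].

    Assumes the KR-21 coefficient computed from the data is positive.
    """
    n = len(freq) - 1
    total = sum(freq)
    mean = sum(x * f for x, f in enumerate(freq)) / total
    var = sum(f * (x - mean) ** 2 for x, f in enumerate(freq)) / total
    kr21 = n / (n - 1) * (1 - mean * (n - mean) / (n * var))
    alpha = (1 / kr21 - 1) * mean
    beta = n / kr21 - n - alpha
    return alpha, beta


def beta_binomial_pmf(x, n, alpha, beta):
    """Probability of score x on an n-item test under the fitted model."""
    log_comb = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
    log_num = (lgamma(alpha + x) + lgamma(beta + n - x)
               - lgamma(alpha + beta + n))
    log_den = lgamma(alpha) + lgamma(beta) - lgamma(alpha + beta)
    return exp(log_comb + log_num - log_den)


# Hypothetical 4-item test with equal frequencies at every score.
alpha, beta = fit_beta_binomial([1, 1, 1, 1, 1])
print(alpha, beta)
print([round(beta_binomial_pmf(x, 4, alpha, beta), 3) for x in range(5)])
```

Comparing the fitted probabilities with the observed relative
frequencies gives a quick check on whether the model is a reasonable
description of a given set of test scores.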

It may be noted that the Bayesian and empirical Bayes procedures
discussed in this paper deal with the setting of passing
scores for a particular test. Both procedures assume the availability
of a minimum mastery or criterion level θ and the availability
of other information such as Q, the ratio of the loss incurred by
a false positive decision to that incurred by a false negative one.
In the context of testing for instructional purposes, θ may be
based on the judgment of a curriculum specialist or a knowledgeable
teacher and Q may be assessed via the time losses encountered by a
misdecision (Huynh, 1976a). The issue is much more involved for
end-of-program certification, such as high school graduation (minimum
competency) testing programs legislated in several states. The
reader is referred to Jaeger (1976) and Shepard (1976) for insight
regarding some of these issues.
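
The roles of the criterion level and of Q can be sketched as follows,
assuming constant losses and a beta-binomial model with parameters
alpha and beta. Under these assumptions the posterior distribution of
the true score given a test score x is beta with parameters alpha + x
and beta + n - x, and granting mastery has the smaller expected loss
when P(θ < criterion | x) <= 1/(1 + Q). This is an illustration and
not a reproduction of Equation (2); all names are hypothetical, and
the incomplete beta function is evaluated with a standard continued
fraction.

```python
from math import exp, lgamma, log


def _beta_cf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the recurrence.
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def incomplete_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def passing_score(n, alpha, beta, theta0, q):
    """Smallest score c with P(theta < theta0 | c) <= 1 / (1 + q).

    Returns n + 1 when no score on the test meets the condition.
    """
    threshold = 1.0 / (1.0 + q)
    for c in range(n + 1):
        if incomplete_beta(alpha + c, beta + n - c, theta0) <= threshold:
            return c
    return n + 1


# Hypothetical 5-item test, uniform prior, criterion .60, equal losses.
print(passing_score(n=5, alpha=1.0, beta=1.0, theta0=0.6, q=1.0))
```

Raising q, that is, treating a false positive as more costly, can
only raise the passing score returned by this sketch, since the
posterior probability of falling below the criterion decreases as the
test score increases.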

The empirical Bayes approach with the availability of a
predetermined criterion level, however, is only the simplest form of
the general framework of mastery evaluation as approached by Huynh
(1976a). The essential component of this model is an external task
(real or hypothetical) that examinees are supposed to perform once
they are granted mastery of the objectives or content upon which a
test is based. Such an external task may be identified in the
context of instruction, especially when instructional units are
sequenced in some logical order. If this requirement is fulfilled,
the specification of θ is no longer necessary. Some suggestions
for solutions along this line have been presented elsewhere (Huynh,
1976a, p. 73-75; Huynh, 1977; Huynh & Perney, 1979). To the
knowledge of the writers, the Bayesian approach as presented by
Swaminathan et al. has not been generalized to situations other
than those involving constant losses and when a criterion level is
available. Although such a generalization may be made, the numerical
analysis would be more involved than can be expected from the
empirical Bayes approach.

The procedures studied in this paper are based on group data and
therefore are appropriate to the extent that minimization of loss is
considered for the entire group of examinees. This may be the case
for minimum competency testing where resources for remedial
instruction are limited. Procedures relating to standard setting in
the absence of group data are available (see, for example, Huynh,
1978).

In conclusion, the empirical Bayes approach yields mastery/
nonmastery decisions identical in most cases to those based on the
Bayesian approach. In addition, the former approach is simpler in
terms of computations, is applicable to any test length, and has
been generalized to more complex testing situations.

BIBLIOGRAPHY 

Comprehensive Tests of Basic Skills, Level C (1973). Monterey,
     California: CTB/McGraw-Hill.

Duncan, G. T. (1974). An empirical Bayes approach to scoring
     multiple-choice tests in the misinformation model. Journal of
     the American Statistical Association 69, 50-57.

Huynh, H. (1976a). Statistical consideration of mastery scores.
     Psychometrika 41, 65-78.

Huynh, H. (1976b). On mastery scores and efficiency of
     criterion-referenced tests when losses are partially known. Paper
     presented at the annual meeting of the American Educational
     Research Association, San Francisco, April 19-23.




Huynh, H. (1977). Two simple classes of mastery scores based on
     the beta-binomial model. Psychometrika 42, 601-608.

Huynh, H. (1978). A nonrandomized minimax solution for mastery
     scores in the binomial error model. Research Memorandum 78-2,
     Publication Series in Mastery Testing. University of South
     Carolina College of Education.

Huynh, H. & Perney, J. C. (1979). Determination of mastery scores
     when instructional units are linearly related. Educational and
     Psychological Measurement 39, 317-325.

Huynh, H. & Saunders, J. C. (1979). Accuracy of two procedures for
     estimating reliability of mastery tests. Research Memorandum
     79-1, Publication Series in Mastery Testing. University of
     South Carolina College of Education. Also presented at the
     annual conference of the Eastern Education Research Association,
     Kiawah Island, South Carolina, February 22-24, 1979.

IBM Application Program, System/360 (1971). Scientific subroutines
     package (360-CM-03X) Version III, Programmer's manual. White
     Plains, New York: IBM Corporation Technical Publications
     Department.

IMSL Library 1 (1977). Houston: International Mathematical and
     Statistical Libraries.

Jaeger, R. M. (1976). Measurement consequences of selected
     standard-setting models. Florida Journal of Educational
     Research 18, 22-27.

Keats, J. A. & Lord, F. M. (1962). A theoretical distribution for
     mental test scores. Psychometrika 27, 59-72.

Novick, M. R., Lewis, C. & Jackson, P. H. (1973). The estimation
     of proportions in m groups. Psychometrika 38, 19-45.

Pearson, K. (1934). Tables of the Incomplete Beta Function.
     Cambridge: University Press.

Shepard, L. A. (1976). Setting standards and living with them.
     Florida Journal of Educational Research 18, 23-32.

Subkoviak, M. J. (1978). Empirical investigation of procedures for
     estimating reliability of mastery tests. Journal of Educational
     Measurement 15, 111-116.

Swaminathan, H., Hambleton, R. K. & Algina, J. (1975). A Bayesian
     decision-theoretic procedure for use with criterion-referenced
     tests. Journal of Educational Measurement 12, 87-98.






Wang, M. M. (1973). Tables of constants for the posterior marginal
     estimates of proportions in m groups. ACT Technical Bulletin
     No. 14. Iowa City, Iowa: The American College Testing Program.

FIGURE 1

Four Categories of Classification
Based on Two Test Administrations

                                First Testing
Second Testing     Nonmastery                Mastery

Mastery            Nonmastery-Mastery        Mastery-Mastery
                                             (consistent decision)

Nonmastery         Nonmastery-Nonmastery     Mastery-Nonmastery
                   (consistent decision)

ACKNOWLEDGEMENT

This work was performed pursuant to Grant NIE-G-78-0087 with the
National Institute of Education, Department of Health, Education,
and Welfare, Huynh Huynh, Principal Investigator. Points of view or
opinions stated do not necessarily reflect NIE positions or policy
and no endorsement should be inferred. The editorial assistance of
Anthony J. Nitko is gratefully acknowledged.






